Needle in a haystack



• Speech is the most important modality of human-human
communication (~80% of information) ... and criminals and
terrorists also communicate by speech

• Speech is easy to acquire in both civilian and 
intelligence/defense scenarios. 

• Finding what we are looking for is more difficult

• The search is typically done by human experts, but always count on:

- Limited personnel 

- Limited budget 

- Not enough languages spoken 

- Insufficient security clearances 

Speech processing technologies are not all-powerful, but they can
help to narrow the search space.

“Speech recognition”



What was said?

• Speech recognition

- Complete transcription - Large Vocabulary Continuous Speech
Recognition (LVCSR): transcription, speech to text, S2T.

- Detection of keywords / keyphrases - keyword spotting (KWS), 
spoken term detection (STD) 

Which language?

• Language recognition (LRE), language identification (LID)

Who said it?

• Choose one out of a set of N speakers - speaker identification

• Confirm the claimed identity of a speaker - speaker verification

• Speaker not heard before - age ID, gender ID, etc.

Speech @ FIT at BUT









University research 
group established in 
1997 

20 people in 2009 
(faculty, researchers, 
students, support staff). 

Also provides
education within the Dept. of
Computer Graphics
and Multimedia.

Cooperating with EU 
and US universities 
and companies. 

Supported by EC, US
and national projects



The goal: high-profile research in speech theory, algorithms and
software implementation

Focus on evaluations



• "I'm better than the other guys" - not relevant unless everyone uses the same data and
evaluation metrics.

• NIST - US Government Agency, http://www.nist.gov/speech 

• Regular benchmark campaigns - evaluations - of speech technologies. 

• All participants have the same data and have the same limited time to 
process them and send results to NIST => objective comparison. 

• The results and details of systems are discussed at NIST workshops. 

• Speech@FIT participates extensively in NIST evaluations:

• Transcription 2005, 2006, 2007, 2009

• Language ID 2003, 2005, 2007, 2009 (now!)

• Speaker Verification 1998, 1999, 2006, 2008

• Spoken term detection 2006

Why are we doing this?

• We believe that evaluations really advance the state of the art

• We do not want to waste our time on useless work...



National Institute of Standards and Technology (NIST)

...working with industry to foster innovation, trade, security and jobs

Phonexia Ltd



• Company created in 2006 by 6 
Speech@FIT members 

• Closely cooperating with the 
research group 

• Key people 

- Pavel Matejka, CEO 

- Petr Schwarz, CTO 

- Igor Szoke, CFO 

- Dr. Lukas Burget, research 
coordinator 

- Dr. Jan Cernocky, university 
relations 

- Tomas Kasparek, hardware 
architect 

Phonexia Language Identification from Spoken Speech Demo

Our technology will help you to distinguish the language spoken. We will allow you to automatically route a valuable
call to someone in your company who speaks the language, or to software that can analyze it, and we will make the
process fast. If you are in the security/defense sector, our language identification system will allow you to get the most
out of telephone calls.







The goal: bringing mature technologies to the market, especially in
the security/defense sector

Not new in the business :-)



Speech @ FIT 

• NIST evaluations are 
supported by intelligence 
sponsors in the US. 

• Project sponsored by US 
Air Force EOARD 

• Project supported by 
Czech Ministry of Interior 

• The Czech Ministry of
Education supports FIT
BUT under the framework
project “Security-Oriented
Research in Information
Technology”



Phonexia 

• Founded based on 
consultations from Czech 
military intelligence. 

• Has delivered systems for
civilian and military
intelligence since 2006.

• Customers in 

• Czech Republic 

• Germany 

• Spain 

• Russia 

Language ID



Technical approach 

• acoustic 

• phonotactic 
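
As a rough illustration of the phonotactic idea (not Phonexia's implementation): a phone recognizer turns speech into a phone string, and one phone n-gram model per language scores that string. The minimal sketch below assumes the phone strings are already available; the toy phone sequences, the language names and all function names are invented for the example.

```python
import math
from collections import Counter

def train_bigram_model(phone_sequences):
    """Count phone bigrams and their left contexts for one language."""
    bigrams, contexts, phones = Counter(), Counter(), set()
    for seq in phone_sequences:
        phones.update(seq)
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, phones

def avg_log_prob(seq, model, vocab_size):
    """Average add-one-smoothed bigram log-probability of a phone sequence."""
    bigrams, contexts, _ = model
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        raise ValueError("need at least two phones to score a sequence")
    total = sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in pairs
    )
    return total / len(pairs)

def identify_language(seq, models):
    """Return the best-scoring language and the per-language scores."""
    vocab = set(seq).union(*(m[2] for m in models.values()))
    scores = {lang: avg_log_prob(seq, m, len(vocab)) for lang, m in models.items()}
    return max(scores, key=scores.get), scores

# Toy "languages": lang_a alternates phones, lang_b repeats them (made-up data).
models = {
    "lang_a": train_bigram_model(["a b a b a b".split(), "b a b a".split()]),
    "lang_b": train_bigram_model(["a a b b a a".split(), "b b a a b b".split()]),
}
best, scores = identify_language("a b a b".split(), models)
```

An acoustic system would instead model the feature vectors directly; a product that combines both approaches would fuse the two kinds of scores.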

Research achievements



ara  F   0.0
eng  T  93.3
far  F   0.0
fre  F   0.3
ger  F   4.9
hin  F   0.0
jap  F   0.0
kor  F   0.0
man  F   1.3
spa  F   0.0
tam  F   0.0
vie  F   0.1

• NIST LRE 2005:
Speech@FIT the best in
2 out of 3 categories

• NIST LRE 2007:
confirmation of the
leading position.

Key ideas:





• Discriminative modeling 

• Gathering training data 
from public sources 

Products



Ready to ship: Phonexia LID 

• Application with GUI for sorting recordings,
and command line version

• Combination of acoustic and phonotactic
approach

• 12 pre-trained languages 

• New languages/models can be trained by the
customer

• Higher-quality languages/models can be trained
discriminatively by Phonexia

• API for developers 

Ongoing development 

• Increasing the robustness to adverse 
factors (speaker, acoustic environment, 
channel) 

Speaker verification



Technical approach 

• Model of speaker against model of the 
“world” 
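
The "speaker model against world model" comparison is commonly expressed as a log-likelihood ratio. The sketch below is only an illustration under simplifying assumptions: a single diagonal Gaussian stands in for each model (practical systems typically use Gaussian mixtures), and the feature vectors, model parameters and threshold are invented.

```python
import math

def gaussian_log_likelihood(frame, mean, var):
    """Log-likelihood of one feature vector under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )

def verification_score(frames, speaker_model, world_model):
    """Average per-frame log-likelihood ratio: speaker model vs. world model."""
    llr = sum(
        gaussian_log_likelihood(f, *speaker_model)
        - gaussian_log_likelihood(f, *world_model)
        for f in frames
    )
    return llr / len(frames)

# Hypothetical (mean, variance) pairs for 2-dimensional features.
speaker_model = ([1.0, -0.5], [0.5, 0.5])
world_model = ([0.0, 0.0], [2.0, 2.0])

target_like = [[0.9, -0.4], [1.2, -0.7], [1.0, -0.5]]
impostor_like = [[-1.5, 1.0], [-0.8, 1.4], [-2.0, 0.6]]

THRESHOLD = 0.0  # assumed operating point; in practice set by calibration
accept_target = verification_score(target_like, speaker_model, world_model) > THRESHOLD
accept_impostor = verification_score(impostor_like, speaker_model, world_model) > THRESHOLD
```

A positive score means the speech fits the claimed speaker better than the general population; the threshold decides how much better it has to fit before the claim is accepted.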

Fighting unwanted variability

Let the models move!

[Figure: target speaker model]

Research achievements

NIST SRE 2006:

• BUT

• STBU
consortium

NIST SRE 2008:

• confirming
leading position

[Figure: NIST SRE08, SHORT2-SHORT3 condition]

[Figure: DET plot "COMPOSITE 2006 (1conv4w-1conv4w): DET 1 All Trials (Common Test) Primary Systems", telephone speech in training and test; x-axis: false alarm probability (in %). The legend lists the minimum and actual detection cost of twelve primary systems, with minimum costs between roughly 0.27 and 0.37.]
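
For readers unfamiliar with the "Min Cost" and "Act Cost" figures in the plot legend: the actual cost is the detection cost at the decision threshold a system really used, while the minimum cost is the lowest cost any threshold could have achieved on the same scores. The sketch below assumes the cost parameters usually quoted for NIST speaker evaluations of that period (C_miss = 10, C_fa = 1, P_target = 0.01); treat them, the toy scores and the function names as assumptions rather than values taken from these slides.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Normalized detection cost for given miss and false-alarm rates."""
    raw = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    return raw / min(c_miss * p_target, c_fa * (1 - p_target))

def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm rates when accepting scores >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa

def actual_and_min_cost(target_scores, nontarget_scores, threshold):
    """Cost at the system's own threshold, and the best cost over all thresholds."""
    actual = detection_cost(*error_rates(target_scores, nontarget_scores, threshold))
    candidates = sorted(set(target_scores) | set(nontarget_scores)) + [float("inf")]
    minimum = min(
        detection_cost(*error_rates(target_scores, nontarget_scores, t))
        for t in candidates
    )
    return actual, minimum

# Invented trial scores; the system decides with threshold 0.0.
targets = [2.0, 1.5, 0.2, -0.3]
nontargets = [-2.0, -1.0, -0.5, 0.4, -1.5]
actual, minimum = actual_and_min_cost(targets, nontargets, threshold=0.0)
```

A small gap between the two numbers indicates a well-calibrated threshold; a large gap means the scores rank trials well but the decision point was badly chosen.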



Key ideas: 

• Coping with unwanted variability 

• Compact representation of speakers allowing for 
extremely fast scoring of speech files. 

Products



Ready to ship: Phonexia Speaker 
Verification 

• GUI application for speaker search in 
audio archives 

• Command line version and API for 
developers 

Ongoing development 

• More powerful techniques for
robustness to non-speaker
information - Joint Factor Analysis.

• Calibration in different setups (lengths 
of utterances, etc.) to always obtain a 
meaningful score. 

But what if we have not heard the speaker before?

Gender ID 

• The easiest speech application to 
deploy ... 

• ... and the most accurate (>96% on 
challenging channels) 

• Limits search space by 50% 

• Available now, standalone or in 
Phonexia Speaker ID 

Keyword spotting



Technical approach 

• Comparing keyword model output with an anti-model. 

• Key question: what is the needed tradeoff between 
speed and accuracy? 
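
One common way to compare a keyword model with an anti-model is a duration-normalized log-likelihood ratio used as a confidence. The sketch below is a simplified illustration, not the product's algorithm: it assumes per-frame log-likelihoods from both models are already available for each candidate segment, and the numbers, names and threshold are invented.

```python
def keyword_confidence(keyword_loglik, background_loglik):
    """Per-frame log-likelihood ratio of keyword model vs. anti-model."""
    if not keyword_loglik or len(keyword_loglik) != len(background_loglik):
        raise ValueError("need two non-empty sequences of equal length")
    return (sum(keyword_loglik) - sum(background_loglik)) / len(keyword_loglik)

def detect(candidates, threshold):
    """Keep candidate segments whose confidence reaches the threshold."""
    hits = []
    for start, end, kw_ll, bg_ll in candidates:
        conf = keyword_confidence(kw_ll, bg_ll)
        if conf >= threshold:
            hits.append((start, end, conf))
    return hits

# (start time, end time, keyword log-likelihoods, anti-model log-likelihoods)
candidates = [
    (0.50, 1.10, [-4.0, -3.5, -3.8], [-6.0, -5.5, -6.1]),
    (3.20, 3.70, [-7.0, -6.5, -7.2], [-6.0, -6.2, -6.1]),
]
hits = detect(candidates, threshold=1.0)
```

Dividing by the number of frames makes short and long keywords comparable. Raising the threshold trades missed keywords for fewer false alarms, which is a separate trade-off from the speed/accuracy choice of recognizer.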




[Figure: processing chain - features, likelihoods, confidences]



Acoustic

+ Fast

+ No problem with OOV (out-of-vocabulary) words

- Cannot index - a new keyword
means new processing of all the
data

- Does not have a language model
- problem with short keywords.



LVCSR

+ Once indexed, the search is very
fast

+ More precise.

- More complex, recognition is
slower

- Limited vocabulary - OOV



Research achievements 



NIST STD 2006 - English

MV Task 2008 - Czech

[Figure: result plots; panel titles: Primary English BNEWS, Primary English Out-of-Vocabulary BNEWS]




Key ideas: 

• Expertise with acoustic, word and sub-word recognition 

• Speech indexing and search 

• Normalization of scores. 

Products



Ready to ship: Phonexia Acoustic KWS 

• GUI application for keyword spotting in 
incoming files 

• Czech and Russian supported 



Ongoing development 

• Command line version and API for 
developers 

• LVCSR-based KWS for English 
and Czech 

• Other languages - Polish, 
Hungarian, Slovak. 

[Screenshot: application status line - "The system is idle and waiting for incoming data..."]





What is special for the ISS audience?



We know you are not working with HiFi... 

• Phonexia Preselector - filtering out DTMF, FAX, ringing tones, 
noises. 

• Channel compensation - coping with irrelevant information. 

We know we will not get your “hot” data... 

• LID: Training new languages by the user 

• SID: Background models trained on publicly available databases. 

• Phonexia applications do not need an Internet connection.

We know you’ll be interested in languages we don’t support

• Custom development (but costly and slow)

• Language-independent technologies, such as SID

We know this is not boxed software

• We respect the specifics of each customer

• We are used to adapting our systems to your data and needs

Brno Speech Core



Shares building 
blocks (source code) 
among all our 
technologies 




Allows for fast 
prototyping of any 
speech application. 

Unified application 
interface enables fast 
and clean integration 
of our technology to 
customers’ systems. 

The API allows the technology to be used (and distributed) as
a whole or in parts

Forms of delivery



Executable software including GUI 
Libraries + models + API 
Combination of both 

Integration in a full speech search system 
Consulting 

Summary



Speech @ FIT: 

• Research - academic, but driven by real demands of the 
intelligence community. 

Phonexia: 

• Technology, SDKs 

• Stand alone applications 

• Custom development 

• Maintenance, training, services 

• Consulting

Together:

• Serving the intelligence community in making the world a 
safer place. 




Contacts 



Phonexia, Ltd. http://phonexia.com/ 

Pavel Matejka, CEO, matejka@phonexia.com 
Petr Schwarz, CTO, schwarz@phonexia.com 

Speech@FIT, Brno University of Technology,
http://speech.fit.vutbr.cz/

Jan “Honza” Cernocky, Head of Department, 

cernocky@fit.vutbr.cz 

Thanks for your attention 
Ready for your questions now or in our booth 